DETAILED ACTION

Claim Rejections - 35 USC § 103 

1. The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

2. Claims 1, 4, 7, 9, 12, 15-16, 19, 23 and 32 are rejected under 35 U.S.C. 103(a) 
as being unpatentable over Okazaki (US Patent: 5,666,555) in view of Sciammarella 
(US Patent: 6,081,266). 

As to claim 1, Okazaki discloses a multichannel information processing device
(i.e. the multiple video signals coming into the system, such as from a VTR and an LD) (see Fig. 1,
Col. 2, Lines 41-57) wherein a plurality of video images are displayed simultaneously on a
display device (i.e. multiple windows on the display screen of the computer displaying
separate video signals) (see Fig. 1, Col. 2, Lines 58-68), comprising:

video image information control means (103 CPU) for acquiring information for
said plurality of video images, and for deciding video image position information relating
to display positions on a display device (108) for said plurality of video images (i.e.
images coming from image reproduction apparatus 101) and outputting said information
for a plurality of video images based on said video image position information (i.e. since
the CPU processes the video image information and assigns this information to a
plurality of windows on the Bit Map Display 108, it must assign a position value to each
of the windows in order to display each of the video images correctly) (see Fig. 1,
Col. 2, Lines 58-68);

cursor position control means (103 CPU) for calculating cursor position
information of a displayed cursor (i.e. the cursor on the screen of the graphical user
interface that interacts with objects on screen) based on cursor instructions information
input via an input device (i.e. keyboard 111, pointing device 105) and generating and
outputting cursor image information based on said cursor position information (i.e. since
the cursor is generated by the IOP 104 and controlled and processed by the CPU 103,
when user input device 105 is activated the cursor position must be calculated by the CPU in
order to allow the position value to be properly accessed and utilized for the selection of
the various windows) (see Fig. 1, Col. 3, Lines 5-12);

display image generating means (106) for synthesizing information for the
plurality of video images output by said video image display control means (103) and
cursor image information output by the cursor position control means (103) and
displaying the same on said display device (i.e. since the multiple video images are
displayed on the Bit Map Display 108 in the form of multiple windows and each of the windows
can be selected by the cursor to activate its audio content, the video images are
successfully synthesized and displayed) (see Fig. 1, Col. 2, Lines 41-68);
distance information generating means (CPU 103) for calculating distances
between the display positions of said plurality of video images and a cursor display
position based on center position information of said plurality of video images and center
position information of the displayed cursor, and generating distance information (i.e.
since each of the windows is selectable by the cursor to activate its audio content,
the computer uses the video image location, which is based on the center position
of the same image object, to determine the placement of the cursor in relation to the
video image, so that when the cursor is within said distance the volume is increased)
(see Fig. 1, Col. 3, Lines 5-12);

and audio output control means (i.e. Audio Selector 102) for deciding volume of
audio data for said plurality of video images based on the distance information
generated by said distance information generating means, and outputting audio data to
an output device (i.e. when the distance is zero, meaning the cursor is found to be
inside the window, the corresponding audio of the video in the window is output
at a preset volume, and when the distance is one, meaning the cursor is not inside the
window, the volume is zero and the audio is not output) (see Fig. 1, Col. 3, Lines 5-12).
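For illustration only, the distance information and volume decision discussed above for claim 1 can be modeled in a short sketch. This is a hypothetical model, not code taken from Okazaki or from the application: the `Window` class, pixel coordinates and the preset volume of 1.0 are assumptions. It shows both the center-to-center distance recited in the claim and the binary (0 inside / 1 outside) reading applied to Okazaki above.

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical on-screen video window: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        # Center position information of the displayed video image.
        return (self.x + self.width / 2, self.y + self.height / 2)

    def contains(self, px: float, py: float) -> bool:
        # True when the cursor lies on or inside the window's perimeter.
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def center_distance(win: Window, cursor: tuple[float, float]) -> float:
    """Euclidean distance between the window center and the cursor position."""
    cx, cy = win.center()
    return math.hypot(cx - cursor[0], cy - cursor[1])


def binary_distance(win: Window, cursor: tuple[float, float]) -> int:
    """Binary reading applied to Okazaki: 0 inside the window, 1 outside."""
    return 0 if win.contains(*cursor) else 1


def binary_volume(distance: int, preset: float = 1.0) -> float:
    """Preset volume at distance 0; muted (0.0) at distance 1."""
    return preset if distance == 0 else 0.0
```

With two 320 x 240 windows at (0, 0) and (400, 0) and the cursor at (100, 100), only the first window is at distance zero, so only its audio is output at the preset volume; the center distance of the first window is sqrt(4000), about 63.2 pixels.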

However, Okazaki does not explicitly teach wherein said audio output control
means sets volume of said audio data to one of multiple values so as to be in inverse
proportion to distance values generated by said distance information generating means,
synthesizes said audio data corresponding to said plurality of video images displayed by
said display image generating means, using said respective volumes, and outputs said
synthesized audio data. Sciammarella teaches wherein said audio output control
means sets volume (i.e. the volume of the individual channel displayed on the screen)
of said audio data (i.e. audio data actually implemented by the sound system) to one of
multiple values so as to be in inverse proportion to distance values generated by said
distance information generating means, synthesizes said audio data corresponding to
said plurality of video images (i.e. the icon representing the sound source, which could be
a video, expands as the volume increases, which means the center of the icon moves
farther away from the dragging mouse cursor; since whether the icon expands or shrinks
when the cursor is operated is a matter of GUI design, setting the proportionality to the
distance is likewise a matter of design) (see Fig. 5) displayed by said display image
generating means (12), using said respective volumes, and outputs said synthesized audio
data (i.e. the computer representation of the sound source has a volume determined by the
size of the image, which is determined by the mouse cursor control) (see Sciammarella,
Figs. 5-7, Col. 4, Lines 34-66).

Therefore, it would have been obvious for one of ordinary skill in the art at the
time the invention was made to have used the proportional, virtual-space-based audio
processing system of Sciammarella in the computing environment of Okazaki and to use
the inverse proportion to distance to control the volume, in order to provide a simpler user
interface for the user to control the audio property (see Sciammarella, Col. 1, Lines 23-28).
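The combination relied upon above (volume in inverse proportion to distance, followed by synthesis of the channels using the respective volumes) can be sketched as follows. This is a hypothetical illustration, not code from Sciammarella or Okazaki: the proportionality constant `k = 100.0`, the clamp at a maximum volume of 1.0, and mixing by a weighted sample-by-sample sum are all assumptions.

```python
def inverse_volume(distance: float, k: float = 100.0, max_volume: float = 1.0) -> float:
    """Volume inversely proportional to distance (v = k / d), clamped to max_volume.

    k and max_volume are assumed values chosen for illustration.
    """
    if distance <= 0:
        return max_volume  # cursor at the image center: full volume
    return min(max_volume, k / distance)


def mix(channels: list[list[float]], volumes: list[float]) -> list[float]:
    """Synthesize the channels by summing each sample scaled by its channel volume.

    Output length is that of the shortest channel.
    """
    length = min(len(c) for c in channels)
    return [sum(v * c[i] for c, v in zip(channels, volumes)) for i in range(length)]
```

Under these assumptions, doubling the distance from 200 to 400 pixels halves the volume from 0.5 to 0.25, and any distance under 100 pixels is clamped to full volume.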
As to claim 16, Okazaki teaches a computer-readable recording medium (110)
storing a program controlling a computer having a display device (108), an input device
(111) and an audio output device (106) to execute a multichannel information
processing for displaying a plurality of video images simultaneously on the display
device (i.e. the multiple windows on the screen, each displaying video images read
from the disk, which stores image data and audio data) (see Fig. 4, Col. 4, Lines
14-41), according to operations comprising:

deciding display positions on the display device for said video images to be
displayed (i.e. since CPU 103 processes the video image data and outputs the
data to a plurality of display windows, it must necessarily decide the positions that each
video image takes up in order to properly display it on bit map display 108) (see Fig. 4,
Col. 4, Lines 14-26);

outputting information for said plurality of video images based on the decided
display positions (i.e. the CPU 103 sends the video images to the various windows on
the bitmap display 108 based on the assigned display addresses) (see Fig. 4, Col. 4,
Lines 14-34);

accepting cursor instructions information input from said input device (pointing
device 105) (i.e. the pointing device 105 inputs the pointer information to the CPU 103
via the IOP 104) (see Fig. 4, Col. 4, Lines 47-58);

calculating cursor position information for displaying a cursor based on said
cursor instructions information (i.e. the location of the pointer on the bitmap display 108
must be calculated in order to update it in response to the operation of the pointing
device 105) (see Fig. 4, Col. 4, Lines 47-58);

generating cursor image information based on said cursor position information
(i.e. the image of the pointer is output on the bitmap display 108; therefore, it must be
generated after the pointer position is updated in response to the operation of the
pointing device 105) (see Fig. 4, Col. 4, Lines 47-58);

synthesizing information for said plurality of video images and said cursor image
information, generating a display image, and displaying the display image on said
display device (i.e. the synthesized screen image data is output as an image
signal to the bit map display 108 via the D/A converter 305, and since the pointer is
present, it must also be synthesized) (see Fig. 4, Col. 4, Lines 20-27);

calculating distances between the display positions of said plurality of video
images and the display position of said cursor and generating distance information (i.e.
a selected window has a distance of zero (selection), which turns its audio on, and when
the cursor is outside the perimeter of a window, that window has a distance of one
(non-selection); the distance is calculated based upon the positions of the windows and
the current pointer position) (see Fig. 4, Col. 4, Lines 46-63);

and deciding volume of audio data for said plurality of video images based on
said distance information and outputting the audio data to the audio output device,
wherein the deciding of the volume of the audio data (i.e. the volume of the audio of the
video is selected based upon the cursor, where the selected window has the nominal
volume and the non-selected windows are muted) (see Fig. 4, Col. 4, Lines 46-63).

However, Okazaki does not explicitly teach wherein said audio output control
means sets volume of said audio data to one of multiple values so as to be in inverse
proportion to distance values generated by said distance information generating means,
synthesizes said audio data corresponding to said plurality of video images displayed by
said display image generating means, using said respective volumes, and outputs said
synthesized audio data. Sciammarella teaches wherein said audio output control
means sets volume (i.e. the volume of the individual channel displayed on the screen)
of said audio data (i.e. audio data actually implemented by the sound system) to one of
multiple values so as to be in inverse proportion to distance values generated by said
distance information generating means, synthesizes said audio data corresponding to
said plurality of video images (i.e. the icon representing the sound source, which could be
a video, expands as the volume increases, which means the center of the icon moves
farther away from the dragging mouse cursor; since whether the icon expands or shrinks
when the cursor is operated is a matter of GUI design, setting the proportionality to the
distance is likewise a matter of design) (see Fig. 5) displayed by said display image
generating means (12), using said respective volumes, and outputs said synthesized audio
data (i.e. the computer representation of the sound source has a volume determined by the
size of the image, which is determined by the mouse cursor control) (see Sciammarella,
Figs. 5-7, Col. 4, Lines 34-66).

Therefore, it would have been obvious for one of ordinary skill in the art at the
time the invention was made to have used the proportional, virtual-space-based audio
processing system of Sciammarella in the computing environment of Okazaki and to use
the inverse proportion to distance to control the volume.

As to claim 19, note the discussion of claim 16 above; claim 19 differs from claim
16 only in the limitations of: generating direction information relating to direction of
display position for each video image as seen from cursor display position; and
outputting to said audio output device so that audio data corresponding to said plurality
of video images is positioned at acoustic image positions in the sound space of said
audio output device in accordance with said distance information and said direction
information.

Okazaki teaches generating direction information relating to direction of display
position for each video image as seen from cursor display position (i.e. since the pointer
value represented on the display 108 has two directions, x and y, the video also has
those directions, since that is how a bitmap display receives the data and displays it on
screen) (see Fig. 4, Col. 4, Lines 46-63);

and outputting to said audio output device so that audio data corresponding to
said plurality of video images is positioned at acoustic image positions in the sound
space of said audio output device (speaker 109) in accordance with said distance
information and said direction information (i.e. since the audio data of the selected
window is output based on the pointer selection, it is positioned in the sound space of the
speaker to which the data values are output) (see Fig. 4, Col. 4, Lines 46-63).
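For illustration of the claim 19 limitation only, direction information (the direction of each video image as seen from the cursor) can be combined with a volume to place audio in a sound space. The sketch below is hypothetical and is not asserted to be the method of Okazaki or of the application: it assumes a two-channel (left/right) output, uses only the horizontal component of the direction, and applies a constant-power pan law, which is a swapped-in technique chosen for illustration.

```python
import math


def direction(cursor: tuple[float, float], center: tuple[float, float]) -> float:
    """Angle in radians of the video image center as seen from the cursor.

    0 points to the right (+x); screen y grows downward, so the y difference
    is negated to give a conventional counter-clockwise angle.
    """
    dx = center[0] - cursor[0]
    dy = cursor[1] - center[1]
    return math.atan2(dy, dx)


def pan_gains(angle: float, volume: float) -> tuple[float, float]:
    """Constant-power stereo pan (assumed two-speaker output).

    Only the horizontal component of the direction sets the left/right balance;
    left**2 + right**2 always equals volume**2.
    """
    pan = math.cos(angle)            # -1 = fully left, +1 = fully right
    theta = (pan + 1) * math.pi / 4  # map [-1, 1] onto [0, pi/2]
    return volume * math.cos(theta), volume * math.sin(theta)
```

Under these assumptions, an image directly to the right of the cursor is heard only from the right channel, one directly to the left only from the left channel, and one directly above or below is centered; the volume argument could come from a distance-based rule such as the one recited in claim 1.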



As to claim 23, note the discussion of claim 16 above; claim 23 is broader in
scope than claim 16 and is rejected on the same ground.

As to claim 32, note the discussion of claim 16 above; claim 32 differs from claim
16 only in that claim 16 is a method claim and claim 32 is an apparatus claim. Thus,
claim 32 is analyzed as previously discussed with respect to claim 16 above.

As to claim 4, Okazaki teaches a multichannel information processing device
according to claim 1, wherein distance information generated by said distance
information generating means (103) includes direction information relating to direction of
video image display position as seen from cursor display position (i.e. since the
windows on the screen have a two-dimensional layout (x and y) in which an input cursor
is placed, the directions are accounted for during the calculation of the positional
information of the window as compared to the cursor), and said audio output control
means (102) makes output to an audio output device based on said distance
information, so that audio data for said plurality of video images is positioned in the
sound space formed by said audio output device (i.e. since each of the windows is
selectable by the cursor to activate its audio content, the distance is either one or
zero: if the cursor is found within the window, the distance is zero and the audio is
output, and when the cursor is outside the window, the distance is one and no
audio is output. Also, the audio signal of any window must be in the sound space of
the speaker 109 when it is output, since the speaker is the only audio output means in the
system) (see Fig. 1, Col. 3, Lines 5-12).

As to claim 7, Okazaki teaches a multichannel information processing device
according to claim 1, further including video image selecting means for selecting, based
on a prescribed algorithm (i.e. since the video images reside in the windows on the
screen and a window is selected by the cursor, an algorithm is used to determine the
window that is selected and therefore the video that is selected), a specified video
image from among a plurality of video images displayed on said display device, wherein
said audio output control means outputs to an audio output device audio data for the
video image selected by said video image selecting means (i.e. since the audio and
video signals are presented together when the window is selected, both the audio and
video are output upon selection) (see Fig. 1, Col. 3, Lines 5-12).

As to claims 9, 12 and 15, these claims differ from claims 1, 4 and 7 only in that
claims 1, 4 and 7 are apparatus claims, whereas claims 9, 12 and 15 are method claims.
Thus, claims 9, 12 and 15 are analyzed as previously discussed with respect to claims
1, 4 and 7 above.

3. Claims 20-21, 5 and 13 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Okazaki in view of Sciammarella as applied to claims 1, 4, 7, 9, 12,
15-16, 19, 22-23 and 32 above, and further in view of Yamagami (U.S. Patent 6,334,025).



As to claim 20, note the discussion of claim 16 above; Okazaki and Sciammarella
do not teach a step for voice-recognizing words included in audio data for said
plurality of video images and a step for converting voice-recognized words into character
data and outputting the same.

Yamagami teaches a step for voice-recognizing words included in audio data for
said plurality of video images (i.e. the CPU 13 executes the audio recognition, causing the
result to be displayed in the display section 402) and a step for converting voice-recognized
words into character data and outputting the same (i.e. the CPU 13 executes the audio
recognition, causing the result to be displayed in the display section 402) (see Figs. 4 and 9,
Col. 10, Lines 5-23, Col. 12, Line 60 - Col. 13, Line 10).

Therefore, it would have been obvious for one of ordinary skill in the art at the
time the invention was made to combine the voice recognition capability of Yamagami
with the multi-window display of Okazaki in order to allow more efficient storage of the
annotation of audio and video data (see Yamagami, Col. 1, Lines 65-68).

As to claim 21, note the discussion of claims 16 and 20 above; claim 21
differs from claim 20 only in the addition of two steps:

calculating distance between the display positions of said plurality of
video images and said cursor position information and generating distance information;

selecting a specified video image from among the plurality of video images based
on said distance information and outputting audio data of the selected video image to
the audio output device.

Okazaki teaches calculating distances between the display positions of said
plurality of video images and the display position of said cursor and generating distance
information (i.e. a selected window has a distance of zero (selection), which turns its
audio on, and when the cursor is outside the perimeter of a window, that window has a
distance of one (non-selection); the distance is calculated based upon the positions of the
windows and the current pointer position) (see Fig. 4, Col. 4, Lines 46-63); and selecting a
specified video image from among the plurality of video images based on said distance
information and outputting audio data of the selected video image to the audio output
device (i.e. the audio data and the video data of the selected window are output to the
speaker 109 and the bitmap display 108, respectively, based on the distance of zero, which
occurs when the cursor is actually in the selected window) (see Fig. 4, Col. 4, Lines 46-63).
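The selecting step as read on Okazaki above can be illustrated with a hypothetical sketch: the window whose distance information is zero (the cursor lies inside its perimeter) is selected, and its audio would then be routed to the audio output device. The names `Window`, `distance_info` and `select_video` are assumptions for illustration; the binary 0/1 distance mirrors the interpretation stated above and is not code from the reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    """Hypothetical video window: name, top-left corner and size in pixels."""
    name: str
    x: float
    y: float
    width: float
    height: float


def distance_info(win: Window, cursor: tuple[float, float]) -> int:
    """0 when the cursor is on or inside the window's perimeter, otherwise 1."""
    inside = (win.x <= cursor[0] <= win.x + win.width
              and win.y <= cursor[1] <= win.y + win.height)
    return 0 if inside else 1


def select_video(windows: list[Window], cursor: tuple[float, float]) -> Optional[Window]:
    """Return the first window at distance zero, or None if the cursor is in no window."""
    for win in windows:
        if distance_info(win, cursor) == 0:
            return win
    return None
```

With windows "A" at (0, 0) and "B" at (400, 0), each 320 x 240, a cursor at (500, 100) selects "B", while a cursor at (350, 100) falls between the windows and selects nothing, so no audio is output.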

As to claim 5, Okazaki teaches a multichannel information processing device
according to claim 1, but does not teach voice data recognition means for recognizing
words included in audio data for said plurality of video images, and character
information display means for converting words recognized by said voice data
recognition means into character data and displaying the same on said display device.

Yamagami teaches voice data recognition means (13 CPU) for recognizing
words included in audio data for said plurality of video images, and character
information display means (13 CPU) for converting words recognized by said voice data
recognition means into character data and displaying the same on said display device
(i.e. the CPU 13 executes the audio recognition, causing the result to be displayed in the
display section 402) (see Figs. 4 and 9, Col. 10, Lines 5-23, Col. 12, Line 60 - Col. 13,
Line 10).

Therefore, it would have been obvious for one of ordinary skill in the art at the
time the invention was made to combine the voice recognition capability of Yamagami
with the multi-window display of Okazaki.

As to claim 13, this claim differs from claim 5 only in that claim 5 is an apparatus 
claim, whereas claim 13 is a method claim. Thus, claim 13 is analyzed as previously 
discussed with respect to claim 5 above. 

4. Claims 22, 6, and 14 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Okazaki in view of Sciammarella and further in view of Yamagami as applied to
claims 20-21, 5 and 13 above, and further in view of Hilpert, Jr. et al. (U.S. Patent
6,469,712).

As to claim 22, note the discussion of claim 20 above; Yamagami does not
explicitly teach a step for connecting to the Internet and a step for searching for related web
sites on the Internet using a voice-recognized word as a keyword.

Hilpert teaches a step for connecting to the Internet (i.e. an Internet access
program) and a step for searching for related web sites (i.e. Net Search) on the Internet
using a keyword (i.e. since the web browser is an extension of the conventional data
processing capability of an individual personal computer, it is natural to use the web search
capability to enhance the operation of the computer) (see Col. 3, Line 50 - Col. 4, Line 50).

Therefore, it would have been obvious for one of ordinary skill in the art at the
time the invention was made to combine the web searching capability of Hilpert with the
text recognition design of Yamagami (i.e. having the text recognition process further
enhanced by use of an Internet search for more detailed information) in order to
provide the user additional interactions with the images displayed to assist visually
impaired users (see Hilpert, Col. 1, Lines 50-66).

As to claim 6, note the discussion of claim 5 above; Yamagami does not explicitly teach
Internet connection means, web site search means for searching for related web sites
on the Internet and web site display means for displaying on said display device a web
site found by said web site search means.

Hilpert teaches Internet connection means (i.e. an Internet access program), web
site search means (i.e. Net Search) for searching for related web sites on the Internet
and web site display means for displaying on said display device a web site found by
said web site search means (i.e. since the web browser is an extension of the
conventional data processing capability of an individual personal computer, it is natural to
use the web search capability to enhance the operation of the computer) (see Col. 3, Line
50 - Col. 4, Line 50).

Therefore, it would have been obvious for one of ordinary skill in the art at the
time the invention was made to combine the web searching capability of Hilpert with the
text recognition design of Yamagami (i.e. having the text recognition process further
enhanced by use of an Internet search for more detailed information).

As to claim 14, this claim differs from claim 6 only in that claim 6 is an apparatus 
claim, whereas claim 14 is a method claim. Thus, claim 14 is analyzed as previously 
discussed with respect to claim 6 above. 

5. Claim 8 is rejected under 35 U.S.C. 103(a) as being unpatentable over Okazaki 
in view of Sciammarella and further in view of Tarabella (U.S. Patent 5,796,945). 

As to claim 8, note the discussion of claim 7 above; Okazaki does not explicitly teach
that the video image selecting means switches to a different video image for selection
whenever a prescribed length of time has passed.

Tarabella teaches video image selecting means that switches to a different video
image for selection whenever a prescribed length of time has passed (i.e. the video clip
that is capable of being displayed can be controlled for the length of time that it is to be
displayed before a change takes place) (see Col. 5, Lines 1-53).

Application/Control Number: 10/669,508

Art Unit: 2629

Therefore, it would have been obvious for one of ordinary skill in the art at the
time the invention was made to combine the time-based preset selection capability of
Tarabella with the image selection system of Okazaki, in order to make the computer
display more productive during idle time (see Tarabella, Col. 2, Lines 5-13).
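The time-based switching for which Tarabella is relied upon can be illustrated as a simple round-robin schedule. This is a hypothetical sketch only; the function name `selected_index` and a fixed switching interval are assumptions, and Tarabella is not asserted to use this particular computation.

```python
def selected_index(elapsed_seconds: float, interval_seconds: float, count: int) -> int:
    """Index of the video image selected after elapsed_seconds.

    The selection advances by one image every interval_seconds (the
    "prescribed length of time") and wraps around after the last image.
    """
    if interval_seconds <= 0 or count <= 0:
        raise ValueError("interval_seconds and count must be positive")
    return int(elapsed_seconds // interval_seconds) % count
```

For example, with three video images and a 10-second interval, image 0 is selected during the first 10 seconds, image 1 from 10 seconds, image 2 from 20 seconds, and the selection returns to image 0 at 30 seconds.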

Response to Arguments 

6. Applicant's arguments with respect to claim have been considered but are moot 
in view of the new ground(s) of rejection. 

Inquiry 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to CALVIN C. MA, whose telephone number is (571) 270-1713.
The examiner can normally be reached on 7:30-5:00.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Chanh Nguyen, can be reached on 571-272-7772. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



Calvin Ma 
January 4, 2008 



/Chanh Nguyen/ 

Supervisory Patent Examiner, Art 

Unit 2629 



